Improved bound on the worst case complexity of Policy Iteration

نویسندگان

Romain Hollanders

Balázs Gerencsér

Jean-Charles Delvenne

Raphaël M. Jungers

چکیده

Solving Markov Decision Processes (MDPs) is a recurrent task in engineering. Even though it is known that solutions for minimizing the infinite horizon expected reward can be found in polynomial time using Linear Programming techniques, iterative methods like the Policy Iteration algorithm (PI) remain usually the most efficient in practice. This method is guaranteed to converge in a finite number of steps. Unfortunately, it is known that it may require an exponential number of steps in the size of the problem to converge. On the other hand, many open questions remain considering the actual worst case complexity. In this work, we provide the first improvement over the fifteen years old upper bound from Mansour & Singh (1999) by showing that PI requires at most k k−1 · kn n + o ( k n n ) iterations to converge, where n is the number of states of the MDP and k is the maximum number of actions per state. Perhaps more importantly, we also show that this bound is optimal for an important relaxation of the problem.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An improved infeasible‎ ‎interior-point method for symmetric cone linear complementarity‎ ‎problem

We present an improved version of a full Nesterov-Todd step infeasible interior-point method for linear complementarityproblem over symmetric cone (Bull. Iranian Math. Soc., 40(3), 541-564, (2014)). In the earlier version, each iteration consisted of one so-called feasibility step and a few -at most three - centering steps. Here, each iteration consists of only a feasibility step. Thus, the new...

متن کامل

An Interior Point Algorithm for Solving Convex Quadratic Semidefinite Optimization Problems Using a New Kernel Function

In this paper, we consider convex quadratic semidefinite optimization problems and provide a primal-dual Interior Point Method (IPM) based on a new kernel function with a trigonometric barrier term. Iteration complexity of the algorithm is analyzed using some easy to check and mild conditions. Although our proposed kernel function is neither a Self-Regular (SR) fun...

متن کامل

Improved Strong Worst-case Upper Bounds for MDP Planning

The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP planning, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved strong worstcase upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of ...

متن کامل

A path following interior-point algorithm for semidefinite optimization problem based on new kernel function

In this paper, we deal to obtain some new complexity results for solving semidefinite optimization (SDO) problem by interior-point methods (IPMs). We define a new proximity function for the SDO by a new kernel function. Furthermore we formulate an algorithm for a primal dual interior-point method (IPM) for the SDO by using the proximity function and give its complexity analysis, and then we sho...

متن کامل

A POLYNOMIAL TIME BRANCH AND BOUND ALGORITHM FOR THE SINGLE ITEM ECONOMIC LOT SIZING PROBLEM WITH ALL UNITS DISCOUNT AND RESALE

The purpose of this paper is to present a polynomial time algorithm which determines the lot sizes for purchase component in Material Requirement Planning (MRP) environments with deterministic time-phased demand with zero lead time. In this model, backlog is not permitted, the unit purchasing price is based on the all-units discount system and resale of the excess units is possible at the order...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

Oper. Res. Lett.

دوره 44 شماره

صفحات -

تاریخ انتشار 2016

Improved bound on the worst case complexity of Policy Iteration

نویسندگان

چکیده

منابع مشابه

An improved infeasible‎ ‎interior-point method for symmetric cone linear complementarity‎ ‎problem

An Interior Point Algorithm for Solving Convex Quadratic Semidefinite Optimization Problems Using a New Kernel Function

Improved Strong Worst-case Upper Bounds for MDP Planning

A path following interior-point algorithm for semidefinite optimization problem based on new kernel function

A POLYNOMIAL TIME BRANCH AND BOUND ALGORITHM FOR THE SINGLE ITEM ECONOMIC LOT SIZING PROBLEM WITH ALL UNITS DISCOUNT AND RESALE

عنوان ژورنال:

اشتراک گذاری